Split-Apply-Combine

Guido Kraemer

Map-Reduce

Map
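Map applies one function to every element of a collection, and each call is independent of the others. Below is a minimal sketch using Python's built-in `map`. The function `square` is a hypothetical example.

```python
def square(x):
    # The mapped function sees one element at a time, never the whole list.
    return x * x

values = [1, 2, 3, 4]
squared = list(map(square, values))
print(squared)  # [1, 4, 9, 16]
```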

Map-Reduce
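Map-reduce adds a second step: the independently mapped results are folded into one value. Below is a minimal sketch with `functools.reduce`, assuming addition as the (associative) reduction operator.

```python
from functools import reduce
import operator

values = [1, 2, 3, 4]
# Map: transform each element independently.
mapped = map(lambda x: x * x, values)
# Reduce: fold the mapped results into a single value.
total = reduce(operator.add, mapped, 0)
print(total)  # 30
```

Because the reduction is associative, partial sums could be computed on separate workers and added afterwards.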

Split-Apply-Combine

Split

Ways to split a matrix
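A matrix can be split along its rows, along its columns, or along both dimensions into single cells. Below is a sketch in NumPy on a small example matrix `m`, which is made up for illustration.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Split along the first dimension: one piece per row.
rows = [m[i, :] for i in range(m.shape[0])]
# Split along the second dimension: one piece per column.
cols = [m[:, j] for j in range(m.shape[1])]
# Split along both dimensions: one piece per cell.
cells = [m[i, j] for i in range(m.shape[0]) for j in range(m.shape[1])]

print(len(rows), len(cols), len(cells))  # 2 3 6
```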

Ways to split a 3d-array

Wickham (2011)
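For a 3d-array, every non-empty subset of the three axes gives one way to split it. Fixing the indices along the chosen axes yields the pieces, and the remaining axes stay intact inside each piece. Below is a sketch in NumPy. The helper `split` is hypothetical and not a library function.

```python
import itertools
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

def split(arr, axes):
    """Yield the pieces obtained by fixing every index along `axes`."""
    # Iterate over all index combinations of the split axes;
    # the other axes are kept whole via slice(None).
    for idx in itertools.product(*(range(arr.shape[ax]) for ax in axes)):
        key = [slice(None)] * arr.ndim
        for ax, i in zip(axes, idx):
            key[ax] = i
        yield arr[tuple(key)]

# Every non-empty subset of the three axes is one way to split the array.
ways = [axes for r in (1, 2, 3) for axes in itertools.combinations(range(3), r)]
for axes in ways:
    pieces = list(split(a, axes))
    print(axes, len(pieces), pieces[0].shape)
```

Splitting along one axis yields 2d slices, splitting along two axes yields vectors, and splitting along all three yields single cells.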

Apply

\[f: n_1\text{D-array} \mapsto n_2\text{D-array}\]

e.g. scalar to vector (\(n_1 = 0, n_2 = 1\)), matrix to scalar (\(n_1 = 2, n_2 = 0\)), vector to vector (\(n_1 = n_2 = 1\))
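The three cases above can be illustrated with small NumPy functions. The specific functions (replication, mean, centering) are hypothetical choices. Any function with matching input and output dimensionality would fit the same slot.

```python
import numpy as np

# Scalar -> vector (n_1 = 0, n_2 = 1): replicate a value three times.
def scalar_to_vector(x):
    return np.full(3, x)

# Matrix -> scalar (n_1 = 2, n_2 = 0): mean of all entries.
def matrix_to_scalar(m):
    return m.mean()

# Vector -> vector (n_1 = n_2 = 1): subtract the vector's own mean (centering).
def vector_to_vector(v):
    return v - v.mean()

print(scalar_to_vector(2.0))
print(matrix_to_scalar(np.array([[1, 2], [3, 4]])))
print(vector_to_vector(np.array([1.0, 2.0, 3.0])))
```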

Combine

Wickham (2011)
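Combining puts the per-piece results back into one array. Scalars become a vector, and vectors are stacked along a new axis. Below is a minimal NumPy sketch on a made-up 3d-array `a`.

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Split along the last axis, apply a matrix -> scalar function (the mean)
# to each 2 x 3 slice, and combine the scalars into a vector of length 4.
per_slice = [a[:, :, k].mean() for k in range(a.shape[2])]
combined = np.array(per_slice)

# If the applied function returns vectors instead (row means, length 2),
# combining stacks them along a new axis, giving a 4 x 2 matrix.
per_slice_vec = [a[:, :, k].mean(axis=1) for k in range(a.shape[2])]
stacked = np.stack(per_slice_vec, axis=0)

print(combined.shape, stacked.shape)  # (4,) (4, 2)
```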

Split-Apply-Combine

Mahecha et al. (2020)
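On a data cube, the dimensions carry names, so the function's input dimensions can be given by name. The cube is then split along all remaining dimensions. Below is a sketch in plain NumPy. The helper `map_cube`, the dimension names `lat`, `lon`, `time`, and the random cube are all hypothetical. This is not the API of any data-cube library, and it assumes the applied function returns a scalar.

```python
import numpy as np

def map_cube(f, data, dims, core_dims):
    """Hypothetical helper: apply `f` to every sub-array spanning `core_dims`.

    `data` is a NumPy array whose axes are labelled by the names in `dims`.
    The cube is split along all other (loop) dimensions, `f` is applied to
    each piece, and the scalar results are combined into a new array.
    """
    core_axes = [dims.index(d) for d in core_dims]
    loop_axes = [i for i in range(len(dims)) if i not in core_axes]
    # Move the loop axes to the front so that each piece is moved[idx].
    moved = np.moveaxis(data, loop_axes + core_axes, list(range(len(dims))))
    loop_shape = moved.shape[: len(loop_axes)]
    out = np.empty(loop_shape)
    for idx in np.ndindex(*loop_shape):
        out[idx] = f(moved[idx])  # apply to one piece
    out_dims = tuple(dims[i] for i in loop_axes)
    return out, out_dims

rng = np.random.default_rng(0)
cube = rng.normal(size=(4, 5, 10))
dims = ("lat", "lon", "time")

# Reduce every time series to its mean: result has dims (lat, lon).
mean_map, out_dims = map_cube(np.mean, cube, dims, core_dims=("time",))
print(out_dims, mean_map.shape)  # ('lat', 'lon') (4, 5)
```

Referring to `"time"` instead of axis `2` keeps the call valid if the cube's axes are stored in a different order.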

Summary

  • Split-apply-combine is an extension of map-reduce to arrays
  • On data cubes, the dimensions are named
  • Easy to parallelize and memory efficient
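Because each piece is processed independently, the apply step can be distributed over workers, and only the pieces currently being processed need to be handled at once. Below is a minimal sketch with `concurrent.futures`. Threads are used only to keep it self-contained; for CPU-bound Python code, processes or a distributed framework would be more typical. The cube is held fully in memory here, whereas in practice chunks would be loaded one at a time.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def chunk_mean(chunk):
    # Each chunk is processed without reference to any other chunk,
    # so chunks can be handled in any order or in parallel.
    return chunk.mean(axis=-1)

cube = np.arange(4 * 5 * 10, dtype=float).reshape(4, 5, 10)  # (lat, lon, time)

# Split along the first axis into independent chunks.
chunks = [cube[i] for i in range(cube.shape[0])]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Executor.map returns results in input order, so combining is a stack.
    results = list(pool.map(chunk_mean, chunks))

combined = np.stack(results, axis=0)
print(combined.shape)  # (4, 5)
```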

Literature

Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1–29. https://doi.org/10.18637/jss.v040.i01

Mahecha, M. D., Gans, F., Brandt, G., Christiansen, R., Cornell, S. E., Fomferra, N., Kraemer, G., Peters, J., Bodesheim, P., Camps-Valls, G., Donges, J. F., Dorigo, W., Estupinan-Suarez, L. M., Gutierrez-Velez, V. H., Gutwin, M., Jung, M., Londoño, M. C., Miralles, D. G., Papastefanou, P., & Reichstein, M. (2020). Earth system data cubes unravel global multivariate dynamics. Earth System Dynamics, 11(1), 201–234. https://doi.org/10.5194/esd-11-201-2020